Explorer Cooperative Caching for GPUs

Authors

  • Saumay Dublish
  • Vijay Nagarajan
  • Nigel Topham
Abstract

The rise of general-purpose computing on GPUs has influenced architectural innovation on GPUs. The introduction of an on-chip cache hierarchy is one such innovation. High L1 miss rates on GPUs, however, indicate inefficient cache usage due to myriad factors such as cache thrashing and extensive multithreading. Such high L1 miss rates in turn place high demands on the shared L2 bandwidth. Extensive congestion in the L2 access path, therefore, results in high memory access latencies. In memory-intensive applications, these latencies get exposed due to a lack of active compute threads to mask such high latencies. In this paper, we aim to reduce the pressure on the shared L2 bandwidth, thereby reducing the memory access latencies that lie in the critical path. We identify significant replication of data among private L1 caches, presenting an opportunity to reuse data among the L1s. We further show how this reuse can be exploited via an L1 Cooperative Caching Network (CCN), thereby reducing the bandwidth demand on the L2. In the proposed architecture, we connect the L1 caches with a lightweight ring network to facilitate inter-core communication of shared data. We show that this technique reduces traffic to the L2 cache by an average of 29%, freeing up the bandwidth for other accesses.
We also show that the CCN reduces the average memory latency by 24%, thereby reducing core stall cycles by 26% on average. This translates into an overall performance improvement of 14.7% on average (and up to 49%) for applications that exhibit reuse across L1 caches. In doing so, the CCN incurs nominal area and energy overheads of 1.3% and 2.5% respectively. Notably, the performance improvement with our proposed CCN exceeds that achieved by simply doubling the number of L2 banks by up to 34%.
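The lookup path the abstract describes can be sketched in a toy model: on an L1 miss, the request probes peer L1 caches over the ring before falling back to the shared L2, so replicated lines are forwarded core-to-core without consuming L2 bandwidth. The sketch below is illustrative only; the class names, hop and L2 latencies, and fill policy are assumptions, not details from the paper.

```python
# Toy model of the CCN lookup path: L1 hit -> peer-L1 forward over the
# ring -> shared L2. All latency numbers and names are hypothetical.

class Core:
    def __init__(self, cid):
        self.cid = cid
        self.l1 = set()  # cache-line addresses resident in this core's L1


class CCNRing:
    """Cores on a ring; an L1 miss probes peer L1s before going to L2."""

    def __init__(self, num_cores, hop_latency=2, l2_latency=40):
        self.cores = [Core(i) for i in range(num_cores)]
        self.hop_latency = hop_latency
        self.l2_latency = l2_latency
        self.l2_accesses = 0  # traffic the CCN is meant to reduce

    def access(self, cid, addr):
        """Return (source, latency) for a load issued by core `cid`."""
        core = self.cores[cid]
        if addr in core.l1:
            return "L1", 1
        n = len(self.cores)
        # Walk the ring: a line replicated in a peer L1 is forwarded
        # to the requester without consuming L2 bandwidth.
        for hops in range(1, n):
            peer = self.cores[(cid + hops) % n]
            if addr in peer.l1:
                core.l1.add(addr)  # fill the requesting L1
                return "peer-L1", 1 + hops * self.hop_latency
        # No peer holds the line: fall back to the shared L2.
        self.l2_accesses += 1
        core.l1.add(addr)
        return "L2", self.l2_latency


ring = CCNRing(num_cores=4)
ring.cores[2].l1.add(0x100)   # core 2 already holds the line
print(ring.access(0, 0x100))  # served by a peer L1; no L2 access
print(ring.l2_accesses)
```

Under this latency model, the peer-L1 path pays per-hop ring latency but stays well below the L2 round trip, mirroring the paper's argument that inter-L1 reuse both shortens latency and frees L2 bandwidth for the remaining misses.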


Related articles

Cooperative Caching on Mobile Devices for Location Dependent Queries

Location dependent services are expected to become a very important revenue stream in the future. To enhance performance of location dependent queries there have been several research efforts in the past. Most of these have been focused on client side caching with extremely little work on cooperative caching. Cooperative caching might particularly be a very good solution to improve performance ...


Cooperative caching: The case for P2P traffic

This paper analyzes the potential of cooperative proxy caching for peer-to-peer (P2P) traffic as a means to ease the burden imposed by P2P traffic on Internet service providers (ISPs). In particular, we propose two models for cooperative caching of P2P traffic. The first model enables cooperation among caches that belong to different autonomous systems (ASes), while the second considers coopera...


Fragment Reconstruction: Providing Global Cache Coherence in a Transactional Storage System

Cooperative caching is a promising technique to avoid the increasingly formidable disk bottleneck problem in distributed storage systems; it reduces the number of disk accesses by servicing client cache misses from the caches of other clients. However, existing cooperative caching techniques do not provide adequate support for fine-grained sharing. In this paper, we describe a new storage syste...


Influence of the Document Validation/Replication Methods on Cooperative Web Proxy Caching Architectures

Nowadays cooperative web caching has shown to improve the performance in Web document access. That is why the interest in works related to web caching architectures designs has been increasing. This paper discusses and compares performances of some cooperative web caching designs (hierarchy, mesh, hybrid) using different document validation/replication methods (TTL, invalidation, pushing, etc)....


An Energy Conserving Cooperative Caching Policy for Ad Hoc Networks

Without pre-existing infrastructure, a Mobile Ad Hoc Network (MANET) can be easily deployed in a hostile environment such as a military operation area or an area undergoing disaster recovery. With limited bandwidth, the data caching technique may be useful to enhance the performance of a MANET. The cooperative caching scheme is shown to be suitable for MANETs. With cooperative caching schemes, ...



Journal title:

Volume   Issue

Pages  -

Publication date: 2017